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Abstract 

A mixture of discrete Binomial distributions (MDBD), denoted by Bn,t(&,p), is a set of pairs 
N),ai) }r_i, where £?(•,•) denotes the Binomial distribution, all pi £ (0,1) are distinct, 
YlJLi a i ^ 1 and all a i £ (0,1). A vector 7 is induced by MDBD if 7 \ = X)J =1 ctj • f° r 

all i £ [0 : N], where B/v,i(p) = ( r ?)p t (l — p) N ~ * l - 

We prove for “large” class C of continuous probability density functions (p.d.f.), that for 
every w £ C there exists MDBD with T ^ N^/<p w /5 that (^-approximates a discretized p.d.f. 
w(i/N) = w(i/N)/[£2^_ 0 w(£/N)] for all* £ [3 : N— 3], where (f> w ^ max xe [ 0jl ] |ui(x)|. Moreover, 
we propose an efficient parallel algorithm that on input p.d.f. w £ C and parameter 5 > 0, 
outputs MDBD that induces a vector 7 which (5-approximates w. Also, we give an efficient 
parallel algorithm that on input a discretized p.d.f. w induced by MDBD with T = N + 1 and 
the corresponding vector p, outputs exactly the coefficients a. 

Cheng et al. [ ] proposed the first sequential algorithm that on input a discretized p.d.f. /?, 
B = D — M that is either Laplacian or SDDM matrix and parameter e £ (0,1), outputs in time 
0{e~’ 2 mN' 2 ) 1 a spectral sparsifier of a matrix-polynomial D — Mn D — D y ) i=0 /3i(D~ 1 M)\ 
However, given MDBD Bn,t{oi,p) that induces a discretized p.d.f. 7 , to apply the algorithm 
in [4] one has to explicitly precompute in O(NT) time the vector 7 . Instead, we give two 
algorithms (sequential and parallel) that bypass this explicit precomputation. 

We propose a faster sequential algorithm that on input MDBD Bn,t{ol,p) with N = 2 k 
for k £ N-t- outputs in 0(e~ 2 m + £~ 4 nT) time the desired spectral sparsifier. Moreover, our 
algorithm is parallelizable and runs in 0{e~ 2 m + e~ A nT) work and 0(log N ■ poly(log n) + log T) 
depth. Our main algorithmic contribution is to propose the first efficient parallel algorithm that 
on input continuous p.d.f. w £ C, matrix B = D — M as above, outputs a spectral sparsifier of 
matrix-polynomial whose coefficients approximate component-wise the discretized p.d.f. w. 

Our results yield the first efficient and parallel algorithm that runs in nearly linear work 
and poly-logarithmic depth and analyzes the long term behaviour of Markov chains in non¬ 
trivial settings. In addition, we strengthen the Spielman and Peng’s [21] parallel SDD solver by 
introducing a simple parallel preprocessing step. 


‘This work has been funded by the Cluster of Excellence “Multimodal Computing and Interaction” within the 
Excellence Initiative of the German Federal Government. 

1 0(-) notation hides poly(log n, log N) factors. 
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1 Introduction 


In their seminal work Spielman and Teng [26] introduced the notion of spectral sparsifiers and 
proposed the first nearly linear time algorithm for spectral sparsification. In consecutive work, 
Spielman and Srivastava [24] proved that spectral sparsifiers with 0(s~ 2 n log n) edges exist and 
can be computed in 0(m\og c n ■ log {w max /w m i n )) 2 time for any undirected graph G = ( V,E,w ). 
The computational bottleneck of their algorithm is to approximate the solutions of logarithmically 
many SDD 3 systems. 

Recently, Koutis, Miller and Peng [15] developed an improved solver for SDD systems that works 
in 0(m log n • log(l/e)) time. In a survey result [ 3, Theorem 3] Kelner and Levin showed that in 
0(mlog 2 n) time all effective resistances can be approximated up to a constant factor. This yields 
a (1 ± e)-spectral sparsifier with only a constant factor blow-up of non-zero edges 0(e~ 2 n log n). 
Although there are faster by a poly log-factor sparsification algorithms [ ] they output spectral 

sparsifiers with poly log-factor more edges. 

Spielman and Peng [21] introduced the notion of sparse approximate inverse chain of SDDM 4 
matrices. They proposed the first parallel algorithm that finds such chains and runs in work 
O[m log 3 n • log 2 k) and depth O (log c n • log k) , where k is the condition number of the SDDM matrix 
with m non-zero entries and dimension n. Furthermore, they showed that in 0(e^ 2 m log 3 n ) time a 
spectral sparsifier D — A ~ £ D — AD~ 1 A can be computed with nnz(A) ^ 0(s~ 2 n log n). In a follow 
up work, Cheng et al. [3] designed an algorithm that computes a sparse approximate generalized 
chain C such that CC T ~ £ M p for any SDDM matrix M and \p\ ^ 1. The chain C is constructed 
iteratively and it involves a normalization step that produces a sparsifier D— M,+i ~ e D—MiD^ 1 Mi 
that is expressed in terms of the original diagonal matrix D, for all iterations i. 

Sinclair and Jerrum [23] analyzed Markov chains with transition matrices W = [I + D~ 1 A\/ 2, 
corresponding to lazy random walks. They proved that these walks converge fast to stationary 
distribution, defined by ir u = d u /(^ u&v d u ), after 0(4>q log(min ug y 7R 1 )) 5 steps. Andersen et 
al. [ ] gave an efficient local clustering algorithm that relies on a lazy variation of PageRank, the 
transition matrix of which is defined by ^“ 0 a (l ~ a) t W t , where a > 0 is a parameter. Their 
local algorithm uses a truncated (finite summation) version of the preceding transition matrix. 

Recently, Cheng et al. [ ] initiated the study of computing spectral sparsifiers D — A « e 
D — D £,i(D~ 1 A) 1 of random walk Laplacian matrix polynomials, where £ is a probability 

distribution over [1 : N], D — A is a Laplacian matrix and £i{D ^ 1 A)* is a random walk 

transition matrix. These matrix polynomials capture the long term behaviour of Markov chains. 
Moreover, a sparsifier of a matrix polynomial yields a multiplicative approximation of the expected 
generalized “escaping probability” [10, 1 ] of random walks. Cheng et al. [ ] gave the first sequen¬ 
tial algorithm that computes a spectral sparsifier of a random walk Laplacian matrix polynomial 
and runs in time 0(e ~ 2 • mN 2 ■ log C1 n • log C2 N) for some small constants ci, C 2 - 

2 Our Results 

The lazy random walk length N in the regime of interest in [1, 10, 23] is of order N = ©(poly(n)). 
The quadratic runtime dependance on N makes the algorithm in [4] prohibitively expensive for 
analysing the long term behaviour of Markov chains. In this paper, we overcome this issue for 

2 The O(-) notation hides 0(poly(log log n)) factors. 

3 SDD is the class of symmetric and diagonally dominant matrices. 

4 SDDM is the class of positive definite SDD matrices with non-positive off-diagonal entries. 

s Graph conductance </>g = min M ( S )^ M (y)/ 2 4>(S), where 4>{S) = |w(5, S)\/n(S) and /r(5) = d u - 
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“large” class of probability distributions 7 over [0 : N] that are induced by mixture of discrete 
Binomial distributions (MDBD) with N = 2 k for k € N+. Our results are summarized as follows. 

In Subsection 2.1, we analyze the representational power of MDBD. In Subsection 2.2, we give 
a sequential and a parallel algorithm for computing a spectral sparsifier of matrix polynomials 
induced by MDBD. In Subsection 2.3, we propose the first parallel algorithm that runs in nearly 
linear work and poly-logarithmic depth and analyzes the long term behaviour of Markov chains in 
non-trivial settings. In Subsection 2.4, we strengthen the Spielman and Peng’s [21] parallel SDD 
solver. 


2.1 Representational Power of MDBD 

Let B(p , N) be Binomial distribution for some parameters p E (0,1) and N € N+. MDBD is a set 
of pairs {(B(pi, N), denoted by Bn,t(&iP), that satisfies the following two conditions: 

1. (distinctness) pi E (0,1) and pi / pj for all i 7 ^ j E [1 : T]; 

2. (positive linear combination) JT =1 a i ^ 1 and ctj € (0,1) for all i E [1 : T\. 

We prove in Section 9 that for every function w in a “large” class of continuous p.d.f., there 
exists MDBD that induces a component-wise approximation of w. 

Theorem 2.1 (MDBD Yields a Component-Wise Approximation). Let w(x) be a four times dif¬ 
ferentiable p.d.f., si > 0 a parameter and I = [0,1] an interval. Suppose there is an integer Nq E N 
and reals p E (0,1) and <f w ^ 1 such that: 

1) max xe/ \w"(x)\ < 2(f w ■ N$, 2) max xe / |u/(x)| ^ \<j) w ■ N 0 , 3) max xe / |ie(x)| < f> w , 

4) ma x xeI \b 2 {x)\ < \p ■ N% , 5) max xe/ \h(x)\ < \p ■ N 0 , 

where the functions 61,62 are defined by b\(x) = [— w(x) + (1 — 2 x)w'(x) + ^x(l — x)w"(x)\ and 

b 2 {x) = ^Ly[u;(x)—3(1— 2 x)w' (x)+(l—Qx-\-Qx 2 )w" (x)+^x{l—x){l— 2 x)w’" {x)+^x 2 {I—x) 2 w' v {x)\. 

Then for every N ^ Nq, any T ^ Q(Ny/cj> w /si) and all i E [3 : N — 3] there is ry E [—p,p] such 
that 

a- + 1}) 

3 = 1 

where F{(x ) = (w(x)/[T + 1]) • Bjsr^x) and Bny{x) = (^)x*(l — x) N ~ l . 

For every function w that satisfies the hypothesis in Theorem 2.1 we associate MDBD Bn.T(a,p) 
that is defined by pj = j/(T + 1) and 07 = w(pj)/(T + 1) for all j E [1 : T]. Moreover, in Section 10 
we give an efficient parallel algorithm that on input a continuous p.d.f. w (satisfying the conditions 
in Theorem 2.2) and integer N E N+, outputs MDBD that induces a discretized p.d.f. which 
approximates component-wise a desired discretized p.d.f. w. 


<£l 

^ N ’ 


( 1 ) 


Theorem 2.2 (An Efficient Parallel Algorithm for Finding MDBD). Let w(x) = C■ f(x) be a p.d.f. 
that satisfies the hypothesis of Theorem 2.1. Suppose also a) 0 ^ f(x) ^ 1, 6) |[/(0) + /(1)] ^ 
P(l), c) 1 ^ C ^ o(N), and d) J^ \f ( - 2 \x)\ dx ^ o(N). Then there is a parallel algorithm 
AppDscrPDF that on input w(x) as above, integer N E N + and parameter sj > 0, outputs 
in 0(Ny/cj> w /ei) work and (){\og{NS])) depth MDBD R/v,x(a:,p) that induces a discretized 
probability distribution y/[l — 5^] over [0 : N] such that for all i E [3 : N — 3] 


7i 

1 -s w 


€ 


(1 + 2r]i) ■ w(i/N) ± 


2 si 


Sn+i,n 
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where the target discretized p.d.f. is defined by w(i/N ) = w(i/N)/S]y+i,N for all i E [0 : N], 
<5w = 1 - ST sI + + \ / ^/n 1) > s N+i,N = T,j=o w (j/ N ) and S t,t+ l = Tl=i w U/[ T + !])■ Moreover, it 
holds that 5 W E [0,o(l)]. 

In Appendix D, we illustrate the representational power of MDBD by applying Theorem 2.2 for 
two canonical continuous p.d.f.: the Uniform distribution and the Exponential Families. 

Exact Recovery Interestingly, in case when a discretized p.d.f. w is induced by MDBD p) 

with exactly T = N + 1 distinct Binomial distributions, we give in Section 11 an efficient parallel 
algorithm that on input vectors p and w, outputs the vector a in 0(N log 2 N ) work and 0(log c N) 
depth for some constant c E N+. 

Theorem 2.3 (Canonical Instances Admit Exact Recovery). Suppose p E (0, l ) w+1 is a vector 
such that 0 < pi A pj < 1 for every i j and w E (0, l ) w+1 is a discretized p.d.f. that is induced 
by MDBD £bv,iv+i( a >p) that satisfies w(i ) = otj ■ B^^fipj) for every i E [0 : N ]. Then 

there is a parallel algorithm that on input the vectors p and w, outputs the vector a E (0, l)^ 4-1 in 
0(N log 2 N) work and 0(log c n ) depth, for some constant c E N+. 

2.2 Spectral Sparsification of Matrix Polynomials induced by MDBD 

A matrix B is T-matrix if it is either Laplacian or SDDM matrix. To highlight that an algorithm A 
preserves the matrix type, we write that the algorithm A on input a T-matrix B outputs a matrix 
B' that is also T-matrix. 

Moreover, we say that a matrix A is a spectral sparsifier of a matrix Y if it satisfies (1 — e)Y A 
X A (1 + e)Y, for short X ~ £ Y, where the partial relation X Y 0 stands for X is symmetric 
positive semi-definite (SPSD) matrix. 

We denote by nnz(A) or tua the number of non-zero entries of matrix A. When we write 
ll B = D — M is T-matrix” we assume that D is positive diagonal matrix and B E R nxri . All 
algorithms presented in this paper output spectral sparsifiers with high probability. 

Sequential Algorithms Cheng et al. [5, Theorem 1.5] gave an algorithm that on input a 
Laplacian matrix L = D — A, even integer N E N+ and parameter e > 0, outputs in time 
0(s~ 2 mL log 3 n ■ log 2 A) a spectral sparsifier D — A ~ £ D — D(D~ 1 A) N of a matrix-monomial 
such that nnz(A) 0(e~ 2 n log n). In Section 4, we give for N = 2 k and k E N+ a 0(log 2 N)- 
factor faster algorithm that computes a spectral sparsifier of T-matrix monomials. Furthermore, 
for any T-matrix D — M such that M is SPSD matrix, we prove that the initial sparsification step 
dominates the algorithm’s runtime. 

Theorem 2.4 (Power Method for Monomials). There is an algorithm PwrSS that on input T- 
matrix B = D — M, N = 2 k for k E N + and e E (0,1), outputs a spectral sparsifier D — Mjv 
D — D(D~ 1 AI) n that is T-matrix with nnz(M^) ^ 0{e~ 2 n log n). The algorithm runs in time 

j 0(rriB log 2 n + e -4 • n log 4 n • log 5 N ) , if M is SPSD matrix ; 

1 0{e~ 2 mB log 3 n + e ” 4 • n log 4 n ■ log 5 N) , otherwise. 

Using Theorem 2.4, we give in Section 5 an algorithm that runs by @(A 2 )-factor faster than [4, 
Theorem 2] and computes a spectral sparsifier of a single Binomial T-matrix polynomials of the 
form D-D L 0 B N,i(p ) ■ (T _ 1 M)* = D — DW* , where W p = (1 - p)I + pD^M and p E (0,1). 
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Theorem 2.5 (Single Binomial Matrix Polynomials). There is an algorithm LazySS that on input 
T-matrix B = D — M, number N = 2 k for k E N+, and parameters e,p E (0,1), outputs a spectral 
sparsifier 

N 

D - M p , N n £ D- DW p =D-DJ2 B N ,i(p) ' (D-'My 

i =0 

that is T-matrix with at most 0(e~ 2 n log n) non-zero entries. The algorithm LazySS runs in time 

f 0(m,B log 2 n + e -4 • nlog 4 n • log 5 N) , if p £ ( 0,1/2]; 

1 0(e _2 ms log 3 n ■ log 2 N + £~ 4 • nlog 4 n • log 5 N) , otherwise. 

In Section 6 , we give our main sequential algorithm that builds upon Theorem 2.5 and computes 
a spectral sparsifier of T-matrix polynomials induced by MDBD. 

Theorem 2.6 (Mixture of Binomial Matrix Polynomials). There is an algorithm SS_MDBD that 
on input T -matrix B = D — M, integer N = 2 k , k £ N+, MDBD Bn } t(oi,p) with <5 = 1 — YlJ=i a i 
and parameter e £ (0,1), outputs a spectral sparsifier D — M pz £ D — D {Ti/W ~ ^]) ■ ( D~ 1 M) i 
that is T-matrix with 0{e~ 2 n log n) non-zero entries, where 7 is a discretized p.d.f. that satisfies 
7 i = J2f=i a j ‘ BN,i(Pj) f or all i- The algorithm runs in time 

j 0(mB log 2 n + £ —4 • nT ■ log 4 n ■ log 5 N) , if M is SPSD matrix ; 

1 0{e~ 2 mB log 3 n + £ -4 • nT ■ log 4 n • log 5 N ) ; otherwise. 

We motivate now the first conclusion of Theorem 2.6. When B = D — DW P is a Laplacian 
matrix (of a graph with self-loops) associated with a Markov Chain with transition matrix W p that 
corresponds to p-lazy random walk process, it holds for p £ (0,1/2] (c.f. Lemma 5.1) that DW p is 
SPSD matrix. 

Given MDBD Bn,t{(^,p) that induces a vector 7 , the algorithm in [4, Theorem 2] outputs a 
spectral sparsifier of the corresponding T-matrix polynomial in time 0(E~ 2 mN 2 + NT ) 6 , where 
the term 0{NT ) accounts for computing the vector 7 . In comparison, our improved algorithm 
SS_MDBD runs in time 0{e~ 2 m + e~^nT) for any MDBD with N = 2 k for k E N+. 

Parallel Algorithms Building upon the seminal works of Spielman and Teng [25], Orecchia and 
Vishnoi [20] and Spielman and Peng [ ], we prove in Section 7 that algorithm SS_MDBD can be 

efficiently parallelized. To the best of our knowledge, this is the first efficient and parallel algorithm 
that sparsifies T-matrix polynomials induced by MDBD with N = 2 k for k € N+. 

Theorem 2.7 (Efficient Parallel Spectral Sparsification of Matrix Polynomial induced by MDBD). 
There is a parallel algorithm pSS_MDBD that on input as in Theorem 2.6, outputs a spectral spar¬ 
sifier D — M D — D X^o(t/[1 — 5]) • (T _ 1 M)* that is T-matrix with nnz(M) ^ 0(s^ 2 nlog c n) 
for some constant c, where 7 is a discretized p.d.f. such that 7 i = YlJ=i a j ' T>N,i(Pj ) for all i. 
The algorithm runs in work 0(e~ 2 mB log cl+1 n ■ log 2 N + £ -4 • nT ■ log c+cl+1 n ■ log 5 N ) and depth 
0( log C2 n ■ log N + log T) for constants ci, C 2 € N. 

By combining Theorem 2.2 and Theorem 2.7, we develop an efficient parallel algorithm that 
outputs a spectral sparsifier of a T-matrix polynomial whose coefficients approximate component¬ 
wise a target discretized p.d.f. w. 

6 Of) notation hides poly(log n, log N) factors. 
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Corollary 2.8 (Approximating Target Transition Matrices). There is a parallel algorithm that 
takes as input a continuous p.d.f. w satisfying the conditions of Theorem 2.2, T-matrix B = 
D — M and parameters e,£j € (0,1)? o,nd it outputs in 0(e~ 2 mB + e~^nNsj<j> w /ei) work and 
0{ log C2 n • log N + log(Ny/<f w /si)) depth a spectral sparsifier D — M « e D — D )T^ 0 (7?;/[1 — <5^]) ■ 
{D^Mf that is T-matrix with nnz(M) ^ 0(e~ 2 nlog c n) such that for all i € [3 : N — 3] it holds 
7i/[l ~ € [(1 + 2 rji) • w(i/N) ± 2£ I /S N+1 , N \. 

2.3 Analyzing the Long Term Behaviour of Markov Chains 

For many finite Markov chains [1, 10, 14, 23] there exists N' G N + such that for every N ^ N' 
certain phenomenon occurs with high probability - (local) mixing time, truncated PageRank, etc. 
Therefore, to analyze the long term behaviour of a finite Markov chain, it suffices to select the 
smallest N = 2 k for k G N+ that is larger or equal to N 1 . 

Corollary 2.9 (Capturing The Long Term Behaviour of Markov Chains). Suppose L = D — A is 
dense Laplacian matrix with mL = 0(n 2 ), e G (0,1), Bn,t{&,p) is MDBD such that a i = 1? 
the degree N = 2 k 0(n ) for k G N + and the number of Binomials T ^ 0(n). Then algorithm 
pSS_MDBD outputs in work 0 (e 4 • tul ■ log c+Cl+6 n) and depth 0( log C2+1 n) a spectral sparsifier 
D — A~ e D — D'^2^ =0 'yi(D~ 1 Ay that is Laplacian matrix with 0(£~ 2 n log C2 n ) non-zero entries for 
some constants c,c\,C 2 G N and 7 is a probability distribution induced by the MDBD 

Multiplicative Approximation of Generalized Escaping Probability Consider a Markov 
chain with transition matrix Q 1 = 7 that corresponds to a generalized random walk 

process of length N. Perform a random walk of length N induced by Q 7 that starts at vertex 
v € V. Then for any subset S C V, the corresponding generalized escaping probability is defined 
by gEsc(u, S, Q y ) = ljGylg, where we denote by It the characteristic vector of a subset T <ZV. 

We define the volume of S by p(S) = YIugs ^ u an< ^ ^ a probability distribution over V 

defined by irs(u) = d u /p(S) if u € S and ns{u) = 0 otherwise. The expected generalized escaping 
probability (E.G.E.P.) with respect to its is defined by 

[gEsc(u, S, Gy)} = ngGylg. (2) 

We show in Appendix A that a spectral sparsifier of a random walk Laplacian matrix polynomial, 
yields a multiplicative approximation of E.G.E.P. for all subsets S C V. 

Lemma 2.10 (Multiplicative Approximation of E.G.E.P.). For any spectral sparsifier D — An ~ £ 
D — Dj2iLoTi(dd~ 1 Ay of a random walk Laplacian matrix polynomial such that 7 is a probability 
distribution over [0 : N], it holds for every subset S C V that 

(s(D-A n )£ s G [(l±e) ■E„^ s [gEsc(u,S , ,^ 7 )]], where Cs = 1 s/Vl(S). 

Using Corollary 2.9 and Lemma 2.10, we propose the first efficient and parallel algorithm that 
runs in nearly linear work and poly-logarithmic depth that yields a multiplicative approximation 
of E.G.E.P. for Markov chains with transition matrices induced by MDBD. 

2.4 Faster SDDM Solver 

Spielman and Peng [21] gave the first parallel SDD solver that constructs in 0{m log C1 n-log 3 k) work 
and 0( log C2 n ■ log/v) depth a sparse 0(l)-approxinrate inverse chain that solves approximately to 
any e > 0 precision an SDD system in 0((m+n log c n-log 3 k) log 1/e) work and 0(log n-log n-log 1 /s) 
depth. In Section 8, we give a simple parallel preprocessing step that strengthens their algorithm. 
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Theorem 2.11. There is an algorithm that on input an n-dimensional SDDM matrix M with m 
non-zeros and condition number at most n, produces with probability at least 1/2 a sparse 0(1)- 
approximate inverse chain that can be used to solve any linear equation in M to any precision e > 0 
in 0(nlog c n • log 3 k ■ logl/e) work and 0(logn • logK • logl/e) depth, for some constant c. The 
algorithm runs in 0(m log C1 n) work and 0( log C2 n ■ log ft) depth for some other constants ci,C 2 - 

For the current state-of-the-art result on parallel SDD solvers we refer the reader to the work 
of Lee et al. [ 7]. 

3 Algorithmic Background on Spectral Sparsification 

We write X ~ £1 © £2 Y to indicate (1 — £i)(l — £ 2 )Y Yi X A (1 + £i)(l + £ 2 )Y. Our analysis uses 
the following five basic facts (c.f. [2, 26]). 

Fact 3.1. For positive semi-definite (PSD) matrices X,Y,W and Z it holds 

a. ifY ££ e Z then X + Y » e X + Z; 

b. if X m e Y and W » e Z then X+ W & e Y + Z; 

c. if X ~ £1 Y and Y « £2 Z then X « £1 © £2 Z; 

d. if X and Y are invertible matrices such that X « e Y then X~ l « 2 e Y~ 1 , Ve € (0, |); 

e. for any matrix V if X « e Y then V T XV ~ e V T YV. 

3.1 Prior Algorithms 

Our algorithms for computing spectral sparsifiers of matrix-polynomials use as a black-box several 
spectral sparsification algorithms for Laplacian and SDDM matrices. 

More precisely, our sequential algorithms build upon Theorem 3.2 that relies on Kelner and 
Levin’s [13, Theorem 3] and Cohen et al.’s [6, Lemma 4], and Theorem 3.3 proposed by Spielman 
and Peng [21, Corollary 6.4], 

Theorem 3.2. [13] There is an algorithm SS takes as input parameter £ € (0,1), matrices D and 
A such that D is positive diagonal and A is symmetric non-negative with An = 0 for all i such that 
L = D — A is Laplacian matrix. Then in 0(mL log 2 n) time outputs a positive diagonal matrix D 
and symmetric non-negative matrix A such that nnz(A) ^ 0(e _2 n log n), An = 0 for all i, and 
D — A ~ £ D — A. Moreover, D — A is Laplacian matrix. 

Theorem 3.3. [21] There is an algorithm PS that takes as input SDDM matrix B = D — M 
and parameter e € (0,1). Then in 0(e~ 2 fns log 2 n) time outputs a positive diagonal matrix D and 
symmetric non-negative matrix M with nnz(AL) ^ 0(e _2 ms logn) and Ain. = 0 for all i, such that 
D — M « e D — MD~ l M and D w £ D. Moreover, D — M is SDDM matrix. 

Our parallel algorithm uses Theorem 3.4, which is the culmination of a research line conducted 
by Spielman and Teng [25], Orecchia and Vishnoi [20] and Spielman and Peng [21]. 

Theorem 3.4. [21] There is an algorithm STOVP takes as input a Laplacian matrix D — M and 
parameter e € (0,1/2). Then it outputs a spectral sparsifier D — AI ~ £ D — M with Mn = 0 for all i 
and nnz(AI) ^ 0(e _2 nlog c n) for some constant c. Moreover, this algorithm requires 0(m log Cl n) 
work and 0( log C2 n) depth, for some other constants a and C 2 - 

Based on Theorem 3.4 Spielman and Peng [21] parallelized algorithm PS (c.f. Theorem 3.3). 
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Theorem 3.5. [21] There is a parallel algorithm that on input an SDDM matrix D — M and 
parameter e G (0,1/2), outputs a spectral sparsifier D — M ~ £ D — MD~ 1 AI with D ~ £ D, M. u = 0 
for all i and nnz(M) ^ 0(e~ 2 n\og c n) for some constant c. Moreover, this algorithm requires 
0(e~ 2 m log Cl+1 n) work and 0( log C2 n) depth, for some other constants c\ and c 2 - 

3.2 Spectral Sparsification of T-Matrices 

We show that the algorithms SS and PS can be amended to produce T-matrix sparsifiers that are 
in normalized form, i.e. the sparsifiers are expressed in terms of the diagonal matrix D minus a 
symmetric non-negative matrix M. Our analysis relies on several results established by Peng et 
al. [3, 4, 21, 22], 

Lemma 3.6. There is an algorithm mSS that takes as input a positive diagonal matrix D, sym¬ 
metric non-negative matrix A ( possibly An T 0) such that B = D — A is Laplacian matrix and 
parameter e G (0,1). Then it outputs in 0(m,B log 2 n) time a spectral sparsifier D — A ~ £ D — A that 
is Laplacian matrix and satisfies A is symmetric non-negative matrix with nnz(A) ^ 0(e~ 2 n log n). 

The next result implicitly appears in [ ]. For completeness we prove it in Appendix B. 

Lemma 3.7. Suppose D — A is Laplacian matrix ( possibly An T 0) and D — A a sparsifier with 
An = 0 for every i such that (1 — e)(D — A) A D — A A (1 + s)(D — A). Then the symmetric 
non-negative matrix A = (D — j^D) + satisfies (1 — 2 e){D — A)PD — Afi.{ 1 + 2 e){D — A). 

We present now the proof of Lemma 3.6. 

Proof of Lemma 3.6. Notice that D — A = D' — A', where D' is positive diagonal matrix and 
A! is symmetric non-negative matrix such that A!-- = 0 for all i. By Theorem 3.2 we obtain 
a sparsifier D' — A' ~ £ /2 D' — A'. Then by Lemma 3.7 we have D' — A' ~ £ D' — A' , where 
A' = ( D' — yq—44 / ) + jz^A 1 is symmetric non-negative matrix. We define by Da = D-D' a 
non-negative diagonal matrix. Set A = Da + A! and observe that it is symmetric and non-negative 
matrix. Now the statement follows since D — A = D' — A'. ■ 

T-Matrices Building upon the work of Spielman and Peng [21, Proposition 5.6] and Cheng et 
al. [1, Proposition 25], we prove in Appendix B the following statement. 

Lemma 3.8 (Closure). Suppose D — M is T-matrix. Then D — D(D~ 1 M) N is T-matrix for every 
N G N_)_. Moreover, if D — M ~ £ D — M is a spectral sparsifier, then D — D(D~ 1 M) N is T -matrix 
for every N G N + . 

Normalized Algorithms We present now two algorithms that sparsify matrices of the form 
D — D(D~ 1 AI) n for N G {1, 2} such that the resulting sparsifiers are in normalized form. 

Lemma 3.9 (Normalized Spectral Sparsification). There is an algorithm mKLC that takes as 
input T-matrix B = D — M and parameter e G (0,1), then it outputs in 0{mB log 2 n) time a 
spectral sparsifier D — M ~ £ D — M that is T-matrix and M is symmetric non-negative matrix 
with nnz(M) ^ 0(e~ 2 n log n). 

Proof. By definition B = D\ + L where D\ is non-negative diagonal matrix and L = D 2 — M is 
Laplacian matrix. We obtain by Lemma 3.6 a sparsifier D 2 — M « £ D 2 — M that is Laplacian 
matrix. Now we consider two cases. IfTT = 0 then we are done. Otherwise D± is PSD matrix 
and by Fact 3.1. a we have D — M ~ £ D — M. Since D — M is SDDM matrix and the operator « £ 
preserves the kernel space, it follows that D — M is SDDM matrix. ■ 


We proceed by stating an interesting structural result that implicitly appears in [21] (c.f. Section 
“Efficient Parallel Construction”). For completeness we prove it in Appendix B.l. 

Lemma 3.10. Suppose B = D — M is T -matrix. Let rji = Mj. — Mj^ • 1* be a column vector, 
di = 1) and Si = di — Mu numbers, and = ( Si/di) ■ diag(W) positive diagonal matrix 

for all i, where Ni = {My | M t j / 0}. Let = (Mu jd % + Mjj/dj) ■ M,\j be the (i,j)th entry of a 
matrix with same dimensions as matrix M and D b = diag(B -1) be a diagonal matrix. 

Then it holds that D — MD~ 1 M = Di + L# + where Di = diag([T — MD~ 1 M]l) is 

non-negative diagonal matrix, L b = Dg-B is Laplacian matrix with at most ms non-zero entries 
and every LjV; = {si/df) Dat. — ryrpfldi is Laplacian matrix corresponding to a clique with positively 
weighted edges that is induced by the neighbour set Ni. 

Spielman and Peng [ ] gave algorithm PS (c.f. Theorem 3.3) for sparsifying matrices of the 

form D — where D — M is SDDM matrix. We extend their result to T-matrices and our 

algorithm outputs a spectral sparsifier in normalized form. 

Lemma 3.11 (Normalized 2-Hops Spectral Sparsification). There is an algorithm mPS that on 
input a T -matrix B = D — M and parameter e € (0,1), outputs in 0(e~ 2 mB log 3 n) time a spectral 
sparsifier D — M ~ £ D — MD _1 M that is "T-matrix and M is symmetric non-negative matrix with 
nnz(M) ^ 0(e~ 2 n log n). 

Proof. By Lemma 3.6 we have D — MD~ X M = Di + L, where Di is non-negative diagonal matrix 
and L is sum of Laplacian matrices. Using similar arguments as in “Section 6 Efficient Parallel 
Construction” [21] we find a sparsifier D — M ~ £ /2 L. Moreover, we can compute the positive 
diagonal matrix D' = diag(L) in (Dims) time (c.f. Appendix B.l), and then by Lemma 3.7 we 
obtain a sparsifier D' — M ~ e L. Since Di is PSD matrix the statement follows by Fact 3.1. a. ■ 

4 Core Iterative Algorithm 

Our goal now is to prove Theorem 2.4. We argue in a similar manner as in [ ], but in contrast 
our analysis shows that the initial sparsification step tolerates higher approximation error. This 
observation yields an improved algorithm whose runtime is faster by a 0(log 2 fV)-factor. 

Moreover, we prove that for any 7~-matrix D — M such that M is SPSD matrix, one can 
construct a spectral sparsifier D — M 2 ~ D — MD~ 1 M by first computing D — M ~ D — M and 
then D — M 2 ~ D — MD~ 1 M. This demonstrates that when M is SPSD matrix, the runtime is 
dominated by the initial sparsification. 

The rest of this section is organized as follows. In Subsection 4.1 we describe the initial phase 
of algorithm PwrSS. Then in Subsection 4.2, we present the iterative construction of the desired 
spectral sparsifier D — Mn ~ £ D — D{D~ 1 AI) N . 

4.1 Initialization 

We begin by extending [5, Lemma 4.3 and 4.4], For completeness, we provide a prove in Appendix E 
where in addition we generalize [5, Fact 4.2], 

Lemma 4.1. Suppose B = D — M is T-matrix and D — M ~ £ D — M is a spectral sparsifier. If 
M is SPSD matrix then it holds that D — « e D — MD _1 M. 

Based on Lemma 4.1, we give a faster sparsification algorithm for T-matrices D — MD _1 M 
such that M is SPSD matrix. 
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Lemma 4.2. There is an algorithm fSS that takes as input T-matrix B = D — M such that M 
is SPSD matrix, and parameter e € (0,1). Then it outputs in 0(rriB log 2 n + £~ 4 nlog 4 n) time a 
spectral sparsifier D — M 2 D — MD~ l M that is T -matrix with rmz(M 2 ) ^ 0{e~ 2 n\ogn). 

Proof of Lemma f.2. We apply Lemma 3.9 to obtain a sparsifier D—M D—M in 0(ms log 2 n) 
time with nnz{M ) ^ 0[e~ 2 n log n) such that D — M is T-matrix. Then by Lemma 4.1 we know 
that D — MD~ X M « £ / 4 D — MD~ X M. Now, by Lemma 3.8 D — MD~ l M is T-matrix. Then we 
apply Lemma 3.11 to obtain in 0(e~ 4 n log 4 n) time a sparsifier D — M 2 ~ e /4 D — MD~ 1 M with 
nnz(M) ^ 0(e~ 2 n log n) such that D — M 2 is T-matrix. The claims follows by Fact 3.1.c. ■ 

4.2 Iterative Construction 

Our analysis of the incurred approximation error after O (log IV) consecutive square sparsihcation 
operations builds upon [5, Lemma 4.1]. In contrast, we prove that for the initial and the final 
sparsifiers it suffices to have only an e approximation, while all intermediate spectral sparsihers 
require finer s' = fi(e/logIV) approximation. Due to this higher initial error tolerance, we improve 
the runtime of their algorithm by a 0(log 2 lV)-factor. 

Lemma 4.3 (Accumulative Error). Let D — M and D — M 2 be T -matrices such that D — M 2 ~ £ 
D — MD~ X M and nnz{M 2 ) ^ 0(e~ 2 n log n). There is an algorithm IndSS that on input T- 
matrix D — M 2 , integer N = 2 k for k £ N+ and parameter 0 < e' ^ e, outputs in time 
0(e /_4 n log 4 n ■ log N) a symmetric non-negative matrix ATy with nnz(Mj^) ^ 0{e~ 2 n log n) such 
that D — Mjv ~( 0 (io g jv-i) £ /w 2 £ D — D{D~ l M) N is T-matrix. 

Our goal now is to prove Lemma 4.3. We establish next a useful algebraic property that all 
matrices of the form D(D~ 1 M ) 2 have in common. 

Lemma 4.4. If M is symmetric matrix, then D(D~ 1 M) 2k is SPSD matrix for every k € N+. 

Proof. Let Y = D~ X I 2 MD~ X I 2 . Notice that D{D~ l M) 2k = D 1 / 2 y 2 k D 1 / 2 = X T X, where X = 
y 2 k 1 j-) 1/2 statement follows since X T X is SPSD matrix. ■ 

We present now the main iterative procedure used in algorithm IndSS. 

Lemma 4.5 (Iterative Procedure). Let D—M andD—M 2 k be T-matrices such that D—M 2 k D— 
D(D~ l M ) 2 , for k € N+. There is an algorithm SqrSS that takes as input the "T-matrix D — M 2 k 
and parameter e' € (0,1), then it outputs in 0(E'~ 2 nnz(M 2 k) log 3 n) time a symmetric non-negative 
matrix M 2 k+ 1 with nnz(M 2 k+ 1 ) ^ 0(e /_2 n logn) such that D — M 2 k+i ~ £ © £ ' D — D(D~ l M ) 2k+1 is 
T -matrix. 

Proof. By Lemma 4.4, D{D~ 1 M) 2k is SPSD matrix for any k € N+. By Lemma 3.8 both D — 
D(D~ l M) 2 and D — M 2 kD~ 1 M 2 k are T-matrices. Hence, by Lemma 4.1 we have that D — 
M 2 kD~ 1 M 2 k ~ £ D — D(D~ 1 M) 2k+ ^ . Now by Lemma 3.11 we have D — M 2 k +1 « e / D — M 2 kD~ 1 M 2 k 
and hence the statement follows by Fact 3.1.c. ■ 

Based on the preceding results we are ready to prove Lemma 4.3. 

Proof of Lemma 4-3. By Theorem 3.3 in time 0((e'-£)~ 2 n log 4 n ) we can compute a spectral sparsi¬ 
fier D—M 4 ~ e © £ / D — D{D~ 1 M) a with nnz{MT) ^ 0(e /_2 nlog ?r). Then we apply (log IV—1) times 
Lemma 4.5 to obtain in 0(e'~ 4 n log 4 n ■ log N) time a spectral sparsifier D — Mj\r ~( 0 (io g Jv-i) £ /) 0 £ 
D — D(D~ 1 M) n with nnz(M jy) ^ 0(e /_ 2 nlogn). The statement follows by applying Theorem 3.2 
with e to compute a refined spectral sparsifier of D — M^. ■ 
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Proof of Theorem 2.4 In the initial phase we compute a sparsifier D — M 2 ~ e /4 D — MD 1 M 
with nnz(M 2 ) ^ 0(e _2 nlogn) using either Lemma 4.2 (when M is SPSD matrix) or Lemma 3.11. 
The statement follows by applying Lemma 4.3 with e' = P(e/log N) to the sparsifier D — M 2 . 

5 Spectral Sparsification of Binomial T-Matrix Polynomials 

In this section, we prove Theorem 2.5. We analyze first the properties of matrices of the form 
W p = (1 — p)I -\-pD~ 1 M for p € (0,1). It is convenient to associate with them matrix-polynomials 
f p (x ) = (1 — p) + px such that f p (D~ 1 M ) = W p . Since the coefficients of the matrix-polynomial 
[f p {x)] N follow Binomial distribution B{N,p ), it follows that Wp = Y^i=o^N,iip) ■ ( D~ 1 M )* for 
every p € (0,1), where B Nii (p ) = 1 -p) N ~ l . 

When D — M is a Laplacian matrix, the matrix corresponds to the transition matrix of a 
p-lazy random walk process of length N (c.f. [23]). We associate to such a Markov chain a matrix- 
polynomial D — DWp = D — D J2iLo BN,i{p) ■ ( D~~ 1 M )*. We present now some useful algebraic 
properties of matrices of the form DW 2k and D — DW,jf J . 

Lemma 5.1. Suppose D — M is T-matrix. Then D — DW r p is T -matrix for every N € N+. Also 
DWp is SPSD matrix Vp € (0,1/2] and DW 2k is SPSD matrix Vp £ (0,1) and \/k € N + . 

Proof. By definition of W p , we have D — DW P = p(D — M) is T-matrix. Suppose D — M is 
Laplacian matrix, then D — DW P is Laplacian matrix and by Lemma 3.8, D — DW^ is Laplacian 
matrix for every N £ N+. Suppose now that D — M is SDDM matrix, then D — DW P is SDDM 
matrix and by Lemma B.2 D — DWp is SDDM matrix for every N £ N+. 

By definition DW P = (1 — p)D + pM and since D — M is diagonally dominant, it holds that 
DWp is SPSD matrix for every p £ (0,1/2]. Moreover, since DW p is symmetric matrix by Lemma 
4.4 it holds that D(D _1 • DW p ) 2k = DW 2k is SPSD matrix for every k £ N+. ■ 

Proof of Theorem 2.5. The statement follows by Lemma 5.1 and Theorem 2.4. ■ 

6 Spectral Sparsification of T-Matrix Polynomials Induced by MDBD 

Here we prove Theorem 2.6. Our approach relies on the following key algorithmic idea. 

Lemma 6.1 (Preprocessing of 2-Hop Spectral Sparsification). LetD—M be a T-matrix, D—M\ ~ £ 

D — M and D — M 2 ~ £ D — MD~ l M are spectral sparsifiers. Then for every p £ (0,1) the sparse 
matrix M Pt 2 = (1 — p) 2 D + 2(1 — p)pM\ + p 2 M 2 yields a spectral sparsifier D — M Pi 2 ~ e D — DW 2 
that is T-matrix. 

Proof. The statement follows by 

D — M p 2 = 2p(l — p)[D — Mi] +p 2 [D — M 2 ] 

« £ 2p(l -p)[D- M] +p 2 [D- MD~ l M] = D — DW 2 . 


Our algorithm SS_MDBD builds upon Lemma 6.1 and Lemma 4.3. Due to the preprocessing 
step in Lemma 6.1 we speed up the sparsification of each T-matrix polynomial D — DW p . for all 
j £ [1 : T]. We present now the pseudo code of algorithm SS_MDBD. 
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Algorithm 1 

(D,M) = SS_MDBD(D,M,ejv iT (a,p),e) 

1. Let e^ = eJ[?>\og A], M tmp = 0 and 5 = 1- Yn=i a i- 

2. (D,M i,M 2 ) =InitSS(D,M,e/3). 

3. For every j € {1, ..., T} do 

3.1 Set p = pj t T = j/(T + 1) and M P:2 = (1 — p) 2 D + 2p(l — p)M\ + p 2 M 2 . 

3.2 (D,M P} n) = IndSS(M Pj2 , A, e') where D — M Pi jv ~ 2 e /3 D — DW p (c.f. Lemma 4.3). 

3.3 Mtmp = Mtm P 4“ OLj • M Pt N. 

4. Sparsify D — M ~ e / 3 D — j^Mtmp by algorithm mKLC (c.f. Lemma 3.9). 

5. Return ( D,M ). 


Algorithm 2 

(D, Mi, M 2 ) = InitSS(D,M,e) 

1. Sparsify D — M\ & £ D — M by algorithm mKLC (c.f. Lemma 3.9). 

2 . Sparsify D - M 2 D - MD~ l M 

2.1 If M is SPSD matrix call algorithm fSS (c.f. Lemma 4.2), 

2.2 otherwise call algorithm mPS (c.f. Lemma 3.11). 

3. Return (D, Mi, M 2 ). 


Let 5 = 1 — i a i- We denote a T-matrix polynomial induced by MDBD Bn,t(®,p) as 

P Z3 = 1 a j( D ~ DW p :j ) = (! - $)D - D Y^=o 7 i{D~ 1 M) 1 , where 7 * = Y7 ]= i a jB N ,i(pj). We are 

now ready to prove Theorem 2.6. 

Proof of Theorem 2.6. Let e' = e/[4 log IV]. We perform first a preprocessing step. We apply 
Lemma 3.9 to obtain a sparsifier D — M ~ e / D — M . Then depending on whether M is SPSD 
matrix we use either Lemma 3.11 or Lemma 4.2 to obtain a sparsifier D—M 2 ~ £ ' D—MD~^M. The 
run time is at most 0(e~ 2 mB log 3 n ■ log 2 N ) or 0(rriB log 2 n + e _4 n log 4 n ) respectively. Moreover, 
the sparsifiers satisfy nnz(Mi),nnz(M 2 ) ^ 0(e~ 2 nlogn • log 2 A). 

We combine Lemma 6.1 and Theorem 2.4 to find each sparsifier D — M PjV v D — DW p . by 
initializing algorithm PwrSS with a sparsifier D—M Pjy 2 ~ e / D—DW 2 .. Let M tmp = YlJ=i a j^pj,N, 
then by Fact 3.1.b it holds (1 — 5)D — M tmp Pg. This phase has 0(e _4 nT log 4 n-log 5 N) runtime 
and each sparsifier satisfies nnz(M Pi > jv) ^ 0(e^ 2 n log n). 

However, matrix Mf rnp can be dense. We apply Lemma 3.9 to obtain a sparsifier D — M ~ £ 
D — Mtmp i n ti me 0(min{n 2 log 2 n, £~ 2 nT log 3 n ■ log 2 A}) such that nnz(M) ^ 

0(s~ 2 n log n). ■ 

7 Parallelization of Algorithm SS_MDBD 

In this section we parallelize algorithm SS_MDBD. This gives the first efficient parallel algorithm 
that computes a spectral sparsher of any T-matrix polynomial with coefficients induced by MDBD. 

Our goal now is to prove Theorem 2.7. By construction of algorithms SS_MDBD and InitSS, it 
suffices to show that we can efficiently parallelize algorithms mKLC, mPS and PwrSS. Then the 
statement follows by noting that each T Binomial matrix-polynomial can be sparsified separately 
and in parallel. 
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We parallelize now algorithms mKLC, mPS and IndSS. 

Lemma 7.1. There is a parallel algorithm pKLC that on input T-matrix B = D — M and 
parameter e E (0,1), outputs a spectral sparsifier D — M ~ e D — M that is T-matrix such that M is 
symmetric non-negative matrix with nnz(M) ^ 0(e _ 2 nlog c n) for some constant c. The algorithm 
runs in 0(mB log C1 n) work and 0( log C2 n) depth, for some other constants ci, C 2 . 

Proof. We argue in a similar manner as in Lemma 3.9 to show that the statement holds for T- 
matrices (Laplacian or SDDM matrices). Then the statement follows by Theorem 3.4. ■ 

Lemma 7.2. There is a parallel algorithm pPS that on input T-matrix B = D — M and parameter 
e E (0,1/2), outputs a spectral sparsifier D — M pz e D — MD~ 1 M that is T-matrix such that 
D ~ £ D, Mu = 0 for all i and nnz(M) ^ 0(e~ 2 nlog c n) for some constant c. The algorithm runs 
in 0{e~ 2 m\og cl+1 n) work and 0 (log C 2 n) depth, for some other constants ci and 02 - 

Proof. Using similar arguments as in Lemma 3.11 we prove that the statement holds for T-matrices. 
Then the statement follows by Theorem 3.5. ■ 

Lemma 7.3. Let D — M and D — M 2 k are T-matrices such that D — M 2 k ~ £ D — D{D~ 1 M) 2k , 
for k E N+. There is a parallel algorithm pSqrSS that on input the T-matrix D — M 2 k and 
parameter s' E (0,1), outputs a spectral sparsifier D — M 2 k +1 ~ £ © £ ' D — D(D~ 1 M) 2k+1 that is 
T -matrix with nnz(M 2 k+ 1 ) ^ 0{e! nlog c n) for some constant c. The algorithm runs in work 
0{e'~ 2 nnz{M 2 k) log cl+1 n) and depth 0(log C 2 n), for some other constants ci,C 2 - 

Proof. We use similar arguments as in Lemma 4.5, but we substitute Lemma 3.11 with Lemma 7.2. 

■ 

Lemma 7.4. Let D — M and D — M 2 are T-matrices such that D — M 2 ~ £ D — AiD~ l M and 
nnz(M 2 ) ^ 0(e~ 2 nlog c n) for some constant c. There is a parallel algorithm plndSS that on input 
T -matrix D — M 2 , integer N = 2 k for k E N and parameter 0 < s' ^ e, outputs a spectral sparsifier 
D — Mjv ?a (©(iogjv-i) e /) 02 e D — D(D~ 1 M) n that is T-matrix and uuz{Mn) ^ 0{e~ 2 n log c n). The 

algorithm runs in work 0(e /_4 n log c+cl+1 n) and depth 0(log C2 n ■ log N) for some constants ci,C 2 - 

Proof. We argue in a similar manner as in Lemma 4.3. By Lemma 7.1 we compute a sparsifier 
D — M 4 ~ £ '© £ D — D(D~ 1 M) i with nnz(MT) ^ 0(e ,_ 2 nlog c n) in work 0((s ■ e')~ 2 n log c+Cl+1 n ) 
and depth 0( log C2 n). Then we apply (log N — 1) times Lemma 7.3 to obtain a spectral sparsifier 
D — M n ~(®(io g iv-i) £ /) 0 e D — D(D~ 1 M) n with nnz^Mx) ^ 0{e'~ 2 n\og c n) in 0(e ,_4 n log c+Cl+1 n) 
work and 0( log C2 Tog N) depth. The statement follows by applying Lemma 7.1 with e to compute 
a refined spectral sparsifier of D — M^. ■ 

We present now the proof of Theorem 2.7 which yields the parallel algorithm pSS_MDBD. 

Proof of Theorem 2.7. We sketch first our parallel algorithm pSS_MDBD. We parallelize algo¬ 
rithm InitSS based on algorithms pKLC and pPS. Then, we sparsify separately and in parallel 
each of the T distinct single Binomial T-matrix polynomials by algorithm plndSS. The resulting 
T sparsifiers are scaled and merged into a T-matrix polynomial induced by MDBD. Since this 
matrix-polynomial might be dense, we sparsify it using algorithm pKLC. 

The correctness of algorithm pSS_MDBD follows by Theorem 2.6. We analyze now the work 
and the depth of algorithm pSS_MDBD. By Lemma 7.1 and Lemma 7.2 the initial phase is dom¬ 
inated by 0{e~ 2 m log Cl+1 n ■ log 2 N) work and 0(log C2 n) depth. Moreover, each of the sparsifiers 
D — M « e D — M and D — M 2 D — MD~ l M has at most 0(s~ 2 nlog c n) non-zero entries. 
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Let e' = e/[121ogA r ], By Lemma 7.4 for each j £ [1 : T] we apply algorithm plndSS to 
compute a spectral sparsifier D — M Pj) n ~ e /6 D — DW p . with nnz(M Pj ^ 0{e^ 2 n log c n). This 
phase runs in work 0(e ,_4 nTlog c+cl ~ , ~ 1 n) and depth 0(log C2 n ■ log N). Furthermore, the linear 
combination M tmp = Ysj=i a j ' Mp,N (in Step 3.3) can be computed in depth O(logT). 

Therefore, we approximate by D — ~ e /6 D — D J2iLo the desired T- 

matrix polynomial induced by MDBD. Since matrix Mtmp might be dense, by Lemma 7.1 we 
compute a spectral sparsifier D—M « e / 12 D—Mt, np with nnz(M) ^ 0(e~ 2 n log C2 n). Algorithm 
pKLC runs in work 0(min{e 2 nTlog c n, n 2 } ■ log cl n) and depth 0( log C2 ri). ■ 

8 Faster SDDM Solver 

In this section we prove Theorem 2.11. We argue in a similar manner as in [21], but in contrast our 
improved analysis relies on the refined initialization phase developed in Section 4 and its consecutive 
parallelization in Section 7. Spielman and Peng’s [ ] proof involves two major steps: the first is 

to construct a sparse 0(l)-approximate inverse chain, and the second is to apply this chain as a 
preconditioner into an algorithm known as “Preconditioned Richardson Iteration” [21, Lemma 4.4]. 

We give now an improved construction for a sparse e-approximate inverse chain. This directly 
implies the desired statement of Theorem 2.11. 

Proof of Theorem 2.11. We apply Lemma 7.1 to compute a sparsifier D — M ~ £ / 16 D — M = B 
with nnz(M) ^ 0(e _2 nlog c n) in work 0 (tob log Cl n) and depth 0(log C2 n). Then by Fact 3.1.d 
we have ( D — M)~ l (D — M) -1 . 

Our goal now is to construct a sparse e-approximate inverse chain of the sparsifier B = D — M. 
The condition number of matrix B satisfies Kg ^ ^-e/s KB = ^R' £ ' = e /(16 log tg). Spielman 

and Peng [ ] proved that O(logfg) iterations suffice for the following iterative procedure to output 

a sparse e-approximate inverse chain. 

By Lemma 7.2 we compute a spectral sparsifier D — M 2 ~ e /8 D — MD~ X M with nnz(M 2 ) ^ 
0(s~ 2 n log c n) in work 0(e _4 nlog c_l_cl+1 n) and depth 0( log C2 n ). By Lemma 3.8 and Lemma 7.3 
for each consecutive call we compute a sparsifier D — M 2 k +1 ~ £ / D — M 2 kD~ 1 M 2 k with the at most 
0(e _2 n log c n ■ log 2 kb) non-zero entries in work 0(e _4 n log c+Cl+1 n ■ log 2 kb) and depth 0(log C2 n). 
We combine now Fact 3.1 and apply recursively O(logfg) times the relation 

[D~ l + (/ + D^M^ID - M 2k +i)- l {I + M 2 kD~ l )\/2 
w e / [D- 1 + (/ + L* -1 M 2 fc)[D - M 2 kD~ 1 M 2 k]~ 1 (I + M 2 kD~ 1 )]/2 
= (D — M 2 fc) _1 . 

Spielman and Peng showed in [21, Corollary 5.5] that D — M 2 kD~ l M 2 k can be replaced with D for 
k = O(logfg) and maintain the desired approximation. The statement follows by Fact 3.1. ■ 


9 Representational Power of MDBD 

In this section we prove Theorem 2.1. Our analysis relies on the following three influential works. 
Hald [12] analyzed mixed Binomial distributions in continuous case. Cruz-Uribe and Neugebauer [7, 
] gave sharp guarantees for approximating integrals using the Trapezoid method. Doha et al. [9] 
proved a simple closed formula for higher order derivatives of Bernstein basis. 


14 



Our goal now is to prove Theorem Theorem 2.1. Hald [12] proved the following result on mixed 
Binomial distributions. 


Theorem 9.1. [12] Let w(x ) be a probability density function that is four times differentiable. 
Then for every N € N and i € [0,1V] the Bernstein basis Bpj^fp) satisfies 


rl f ^ D w(i/N) 

w(p) ■ B N:i (p)dp = ——— 


IQ 


bi(i/N) b 2 (i/N) f 1 \1 

N + N 2 \N 3 J_ 


(3) 


where the functions are defined by b±(x) = ^x)[~ w ( x ) + (1 — 2 x)w'(x) + \x(l — x)vj"(x)\ and 
b 2 (x) = -^hj[w{x)—3{l—2x)w'{x)+(l—6x+6x 2 )w''(x)+^:x{l—x){l—2x)w"'(x)+\x 2 {l—x) 2 w' v {x)\. 

We distinguish two types of approximation errors. The error term (1 + ry) (c.f. Equation 1 in 
Theorem 2.1) is caused by the error introduced in Equation 3. The second error type is due to 
the integral discretization with finite summation. The later approximation error in analyzed by 
Cruz-Uribe and Neugebauer [7, 8]. We summarize below their result. 

Theorem 9.2. [7, 2] Suppose f be continuous and twice differentiable function, T £ N is number, 
and the discrete approximator of f is defined by AT(f) = [|(/(0) + /(l)) + f{^/T)]/T. Then 

the approximation error is given by the expression Ex(f) = \Ax{f) — J 0 f(t)dt\ = | ^T =1 Li\, where 
L i = \ lx!-! [w 1 ~ (* “ c *) 2 ] f"( t ) dt and °i = ( x i -1 + x i )/ 2 - 

The rest of this section is devoted to prove Theorem 2.1. We use the following two results 
established by Cruz-Uribe and Neugebauer, and Doha et al. 

Lemma 9.3. [7, 8] The Bernstein basis satisfies / 0 * B^^{x)dx = for every i € [0 : N]. 
Lemma 9.4. [9] The pth derivative of a Bernstein basis satisfies for every i € [0 : N] that 


d p B Nt j(x ) 
dx p 


N\ 

(N — p)\ 


min{z,p} 

E 

fc=max{0,i+p— N} 



EN—p,i—k(xf 


We propose an upper bound on the integral of pth order derivative of Bernstein basis. 


Corollary 9.5. For every p € [1 : N — 1] and i € \p + 1 : N — (jp + 1)] such that i + p ^ N, the 
Bernstein basis satisfies 



d p B Nti (x) 
dx p 1 j 


dt E 


N\ 

(N-p)i 


2 p 

N + l' 


Proof. We combine Lemma 9.3 and Lemma 9.4 to obtain 


d p B N j (x) 


dxP 


(t) 


dt A 


N\ 


(N-p) 


! E (yfcj ’ j 0 B N-p,i-k{t)dt ~ 


N\ 


2 p 


(. N-p)\ N + l' 


We are now ready to prove Theorem 2.1. 


Proof of Theorem 2.1. Recall that Ffix) = w(x)B^^fx). By Theorem 9.2 we have 


T 




1 

2 


rx k 


E 

k=\ Jx k-1 


1 

4 T 2 


(t - c k f 


d 2 Fjfx) 
dx 2 


(:t)dt 



d 2 Fi{x) . . 

dx 2 [) 


dt. 


15 



































Since d ^‘{ x> \ = w" • B^,i + 2 ■ w' • B' N i + w ■ B'f we consider following three cases: 
Case 1: We combine max ie [o,i] \ w "i x )\ ^ 2</> w • N 2 and Lemma 9.3 to obtain 

w"{t) ■ B Nj i(t)\dt ^ 2 (j) w ■ N. 

Case 2: Using max xG [ 0jl ] |u/(x)| ^ ^<f> w ■ N and Corollary 9.5 it holds 




• B N,i{t)\dt < <j> w 


■N. 


Case 3: Combining max xg [ 0i i] I u '( x )l ^ and Corollary 9.5 yields 


r Ht) 

J 0 


B NAt)\dt < 4> w ■ / B'f t (t)dt ^ Ac() w ■ N. 


The desired result follows from the preceding three cases and Theorem 9.1. 


10 Approximating Discretized PDF 

In this section we prove Theorem 2.2. We begin our discussion by presenting the pseudo code of 
algorithm AppDscrPDF. 


Algorithm 3 Approximate Discretized PDF by MDBD 
B N , T (oi,p ) = AppDscrPDF(u;, <f> w , N, ej) 

1. Compute S N+ i tN = Y,2=o w ( i / N ) and s t,t+i = Ya=i w(i/[T + 1]), where T = \N^(j) w /£ I \. 

2. Compute aj = w(pj) ■ N/[(T + 1) • Slv+gA'], where pj = j/[T + 1] for all j G [1 : T], 

3. Return Bn,t{oi,p)- 


Before we prove Theorem 2.2, we analyze the class of continuous probability density functions 
that admit a discretized approximation by MDBD. 

Lemma 10.1. Let w(x) = C ■ f[x) be a twice differentiable p.d.f. such that for N e N + it holds 
a) 0 < f(x) ^ 1, b) i[/(0) + /( 1)] > D(l), c) 1 ^ C ^ o(N), and d) f* |/ (2) (x)| dx ^ o(N). 
Then, for T = |~ Nyj<f w /ei\ with 4> w /ei ^ 1 it holds 


1 - ST 'n +l ^ T ^ = °( 1 ); where St,t+i = ^2 W ( k /[ T + !]) 
h>N+i,N/dy ^ 


k =1 


Proof. By Theorem 9.2 for the discrete approximator of / 


N 

and Sn+i,n = J2w(k/N). 

k =0 


Am U) = m 


M -1 


^(/(o)+ /(!))+ f\ M 


i= 1 


it holds that 


A T+i(f)- ! f(t)dt ^ 1 2 f f (2) {t ) 

Jo 8 (T + 1) Jo (p w \NJ 
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and similarly 


AN{f) ~ l fm ^ 8 W l \ fi2){t) 

By definition, Jq 1 f(t)dt = C~ 1 G (l/o(lV), 1] and thus 


dt ~ ° ( N 


At+i (/) G 


1 -l £l 

— ± -— • o 

c (f> w 


and A N (f) G 


C 


± o 


Let d = [/(0) + /(l)]/2. Straightforward checking shows that 


St,t+i 

T+l 


= C 


Ar+itf) - 


d 


T+l 


, <SW+i,JV 

and -—— = C • 

N 


An ^ + N 


We prove now the upper bound. By assumption d € [fl(l), 1] and since C ^ o(N) we have 


A4 


■S^t+i/^T + 1) _ A T +i(f) - jrpy _ c T+i ^ 4>j, ' 0 (n) 
S n+ + n /N A N (f) + A ^ !+ d_±o(j r ) 


+ 1 - 


i - ^ i _ , 1 , = i - 0 (i). 


h+ [f-°(w)] 

We can prove the lower bound A ^ 1 — o(l) using similar arguments. 
We present now the proof of Theorem 2.2. 


N I + I 

C ^ N 


Proof of Theorem 2.2. By definition St,t+ i = T,k=l w ti/[ T + 1 }), Sn+i,n = T,f=o w U/ N ) and the 
desired discretized p.d.f. is 


w(i/N) 
w(i/N) = - 


w(i/N) 
Y/f=0 W U/ N ) S N+1,N 


By Theorem 2.1, it holds for all i G [3 : N — 3] that 


(1 + H) 'ATH _ Y, 1 "°(. [r t 111 • B^U/lT+ 1]) 


N 


3 = 1 


T + l 


N 


(4) 


We construct now MDBD Bn,t(,cx,p) as follows: for all j G [1 : T] we set 


a.n = 


w(pj ) • iV j 

-, where = 


Sn+i,n • (T + 1 ) 


T + l 


Moreover, Bn,t(&,p) induces a vector 7 that satisfies 7 .j = ]Ty =1 ' BN,i(Pj ) for all z G [0 : IV]. By 

multiplying Equation 4 with IV/Sjv+i.jv we obtain 

|(1 + rji) w(i/N) - 7 i| ^ er/SW+giV- (5) 

Furthermore, since 


E' 

l=i 


AT 


T 


(T + 1 ) • Sn+i,n 




St,t+i/(T + 1) 
‘S’at+i.jv/AI 
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by Lemma 10.1 there is a small positive number 5 W = o(l) such that 


St,t+i/{T + 1) 
Sn+i,n/N 


T 

1 ~ 

3 = 1 


O','. 


Hence, we have 

N N T T N T 

J2 = a 3 ■ B N,iiPi) = a i b nApj) = J2 a J = 1 ~ Sw - 

2 =0 z =0 j=l j=l 2=0 j=l 

Since cL, = o(l), by Equation 5 it follows that y/[l — <5 W ] is a discretized probability distribution 
over [0 : N] that approximates component-wise the desired discretized p.d.f. w. 

We note that the summations Sn+i,n and St,t+ i can be computed in 0(N + T) work and 
0(log N + logT) depth. Hence, the statement follows. ■ 

11 Efficient Parallel Solver for Transpose Bernstein-Vandermonde 
Systems 

Problem 1. Suppose a vector 7 € (0, l ) Ar+1 is induced by a convex combination of exactly N + 1 
discrete Binomial distributions B(pi,N ) such that 0 < pi / pj < 1 for all i 7 ^ j. Find the unique 
vector a € (0, 1 )^ +1 such that 7 \ a j ' B N,i(,Pj) for all i G [0 : IV]. 

The Bernstein basis is a well studied primitive in the literature for polynomial interpolations [7, 
8 ]. It is defined by B Nk fp) = (^)p k { 1 — p) N ~ k for any k £ [0 : N], Let Bn,t((*,p) be MDBD with 
T = N + 1. Then the Bernstein basis matrix is defined by [B N(p)]ji = BN,i(pj ), Vi, j € [1 : IV + 1], 
and it has a full rank (c.f. Appendix C). Moreover, the vector a is the unique solution of the linear 
system Btv(p) t o = 7 . 

In this section, we give an efficient parallel algorithm that solves Problem 1 and works in nearly 
linear work and poly-logarithmic depth. Our goal now is to prove Theorem 2.3. We reduce a 
transpose Bernstein-Vandermonde system to a transpose Vandermonde system that can be solved 
efficiently and in parallel by a variation of an algorithm proposed by Gohberg and Olshevsky [ ]. 

We present now their main algorithmic result. 

Theorem 11.1. [11] There is an algorithm that on input two vectors a,p € M A+1 such that 
0 < Pi 7 ^ Pj < 1 f or oM i 7 ^ 3 1 outputs the vector 7 = V(p) T a in 0(N log 2 N ) time. 

Theorem 11.1 follows by [11, Algorithm 2.1] which relies on a non-trivial matrix decomposition of 
V(p) T to compute in 0(N log 2 N ) time the desired matrix-vector product. It can be easily verified 
that [11, Algorithm 2.1] can be amended to compute the vector a = [V (p) T ] X 7 in 0(N log 2 N) 
time. Furthermore, straightforward checking shows that this modified algorithm can be easily 
parallelized. We summarize below the resulting parallel algorithm. 

Theorem 11.2. [11] There is a parallel algorithm that on input two vectors j,p € as in 

Theorem 11.1, outputs the vector a = [V(p ) T ]^ 1 7 in 0(Nlog 2 N) work and 0(log c n ) depth, for 
some constant c € N+. 

We prove now that the Bernstein basis matrix Bjv(p) admits the following decomposition. 
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Lemma 11.3. Suppose p € (0,1) JV+1 is vector such that 0 < pi pj < 1 for all i j, V(p) 
is Vandermonde matrix defined by [V(p)]ji = D p = diag({(l — and Dqn = 

rj J 

dia s({(T)}*=o) are positive diagonal matrices. Then it holds that Bjv(p) = D p ■ V(p) • Dcn- 
Proof. By definition [D p ■ V(p) • L>c7v]j,i = (1 - Pj) N d) = B N,i{Pj) = [ B N(p)]ji- ■ 

We are ready now to prove Theorem 2.3. 

Proof of Theorem 2.3. By Lemma C.l the Bernstein matrix B n(p) is invertible. Given a vector 
7 € (0,1 ) JV+1 we want to find the vector a = [Bat(p) t ]~ 1 7 . By Lemma 11.3 we have [Bat(p ) t ] _1 = 
[. D p j ^ 1 • [V(p) T ] -1 • [Dcn]~ 1 - Moreover, we can compute a vector 7 ' = [Dcn]^ 1 7 in O(n) time. 
Using Theorem 11.2, we obtain a vector 7 " = [V(p) T ] _ 1 7 / in 0(N log 2 IV) work and 0(log c n) 
depth. The desired vector a = [Dp] -1 7 " takes further 0(n ) work and 0(1) depth to compute. ■ 
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A Generalized Escaping Probability 

Proof of Lemma 2.10. By definition, for every vector x the spectral sparsffier D — A preserves 

approximately the quadratic form 

x t (D — A)x € [(1 ± e) ■ x t (D — DQ^x], 

Hence, the statement follows by applying the identities 

Cs(D — DQ-fj^s = 7rJ(I- £y)ls = 1 - Ts^yl-S = 

= E v ^ s [gEsc(u, S, C? 7 )] • 


B Spectral Sparsification of T-Matrices 

Our proof of Lemma B.2 is based on the Perron-Frobenius Theorem [ ] for non-negative matrices. 

Theorem B.l. [19, Perron-Frobenius] Suppose A is symmetric nonnegative matrix. Then it has 
a nonnegative eigenvalue A which is greater than or equal to the modulus of all other eigenvalues. 
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Lemma B.2. Suppose D — M is SDDM matrix. Then D — D(D 1 M) N is SDDM matrixMN E N+. 

Proof. Since D — M is SDDM we have D >- M. By Fact 3.1.e it holds I >~ D~ 1 / 2 A1D^ 1 ^ 2 = X. 
Hence, the largest eigenvalue A(X) < 1. By Theorem B.l the spectral radius p(X) < 1, i.e. 
|Aj(X)| < 1 for all i. Since X is symmetric it has the form X = JT A iUi,uJ. Moreover, we have 
X k = A iUiuJ for every k E N + . Thus the spectral radius of X k satisfies p(X k ) = p(X) k < 1. 

Notice that B k = D — D(D~ l M) k is symmetric non-negative matrix for every k E N+. By 
definition D — M is diagonally dominant and thus D _1 M1 P 1 component-wise. This implies that 
B k is diagonally dominant matrix. Notice that B k = D 1 / 2 [/ — X k ]D 1//2 and since p(X k ) < 1 it 
follows that B k is positive definite and hence SDDM matrix. ■ 

Proof of Lemma 3.8 We combine Lemma B.2 with the following two statements. 

Fact B.3. Spielman and Peng [21, Proposition 5.6] showed that if D — M is SDDM matrix, then 
D — MD~ l M is SDDM matrix. Also Cheng et al. [4, Proposition 25] showed that if D — M is 
Laplacian matrix then D — D(D~ 1 M) N is Laplacian matrix for every N E N+. 

Based on Lemma 3.9 and Fact B.3 we establish the following result. 

Lemma B.4. Suppose D — M is T-matrix and D — M ~ E D — M is a spectral sparsifier. Then 
D — D(D~ 1 M) n is T-matrix for every N E N + . 

Proof of Lemma 3.7 We use the following result that appears in Peng’s thesis [22]. 

Lemma B.5. [22] Suppose D — A is Laplacian matrix (possibly An ^ 0), and D — A a sparsifier 
with An = 0 for every i such that (1 — e){D — A) P D — A P D — A. Then the symmetric 
non-negative matrix A = (D — D) + A satisfies (1 — e){D — A) P D — A P (D — A). 


Proof of Lemma 3.7. Let D\ = jj^D and A\ = j^A. Then A =|(D — A) P D\ — A\ P D — A 
and by Lemma B.5 the symmetric non-negative matrix A = (D — D\) + A\ satisfies \rf(D — A) P 
D — A P D — A. Since ^ 1 — 2e for every e E (0, the statement follows. ■ 


B.l Structural Result 

Suppose D — Al is 7~-matrix. We show that the matrix D — AID~ 1 M can be expressed as a sum 
of a non-negative main diagonal matrix and a sum of Laplacian matrices. 

Proof of Lemma 3.10. Let M E M nxn and nnz(M) = m. We decompose the entries of matrix 
MD~ X M into three types. We set type 1 to be the entries (MD l M)n = for all i. 

We note that all entries of type 1 can be computed in 0{m ) time. We consider next the off-diagonal 
entries 


(MD~ L M)ij = 


( Mn/di + Mjj/dj ) • Mij 

' -v-' 

type 2 


+ 


E M ik Mj k /D kk 
MPdl 

^ v ^ 
type 3 


Observe that the number of type 2 entries is at most m. Now for a fixed k we note that the 
corresponding entries that appear in type 1 and type 3 form a weighted clique (with self-loops) 
whose adjacency matrix is defined by ■ 

Straightforward checking shows that MD~ l M = B + JT ■ By Lemma 3.8 D — MD~ 1 AL 

is T-matrix and thus diagonally dominant. Hence, the Laplacian matrices L b and L^r for all i 
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exist. Moreover, we can compute in 0(m) time the positive diagonal matrices and DjV; = 
(sj/dj)diag(iVj) for all i. To see this, observe that s* and d{ can be computed in 0{m) time for all 
i, and the number of elements in the disjoint union | Uj JVj| ^ m. ■ 


C Bernstein Basis Matrix 

We prove below that the Bernstein basis matrix in Problem 1 has full rank. 

Lemma C.l. Suppose a vector p £ (0, l) Ar+1 satisfies 0 < pi pj < 1 for all i j. Then the 
Bernstein basis matrix B n{p) has a full rank. 

Proof. Suppose for contradiction that rank(B 7 v(p)) < N + 1. Then there is a vector A € R w+1 such 
that the linear combination of the columns of Bjv(p) satisfies YliLo ^'[Bjv(p)]:,j = 0. Let f\(x) 
be a polynomial defined by f\(x) = YliLo ' ^N,i(x) = YliLo^i ' — x) N ~ l . Notice that 

f\{pj ) = 0 for every j £ [1 : N + 1], i.e. f\(x) has N + 1 roots. However, since the polynomial 
fx(x) has degree N it follows that f\(x) = 0. Therefore, we obtained the desired contradiction. ■ 


D Approximating Two Canonical PDFs 

Here, we illustrate the representational power of MDBD. We show that there are MDBD satisfying 
the hypothesis in Theorem 2.2 and approximate two canonical continuous p.d.f.: the Uniform 
distribution and the Exponential Families. More precisely, we prove that the Uniform distribution 
and the Exponential families admit a multiplicative and an additive approximation, respectively. 


D.l Uniform Distribution 

Lemma D.l (Uniform Distribution). Let w(x) = 1, N € N+ and e > 0. If T ^ P(Ne -1 / 2 ) then 
it holds that 


T + 


j B N)i 


j =i 


T+ 1 


€ 


(1 ± e) 


1 


N + 1 


, for all i £ [3 : N — 3]. 


Proof. By Theorem 9.2 we have that 


T 


I> 

i— 1 




E 


1=1 Jx i~ 1 


m 2 


- (t - cfy 


d 2 B Nji (x) 


dx 2 


(t) 


dt C 


8 N 2 


d 2 B Nti (x) 


dx 2 


(t) 


dt. 


By combining Lemma 9.3 and Corollary 9.5 for every i £ [3 : N — 3] it holds 


1 

N + 1 


1 

T + 1 


T 


y B N>i 

3 = 1 


J 


T + l 




E L - 



d 2 B Ni (x) 


dx 


N 

2T 2 ' 


We note that 


N 

2T 7 




N+l 


since 


T ^ LL(Ne 1//2 ), and hence the statement follows. 


Remark D.2. All conditions of Theorem 2.2 hold. Note that w(x) = 1-1, i.e., f{x) = 1. 
a) C = 1, b) f(x) = 1, c) |[/(0) + /(!)] = 1 and d) \f^(x)\dx = 0, since f (2 \x) = 0. 
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D.2 Exponential Families 

Lemma D.3 (Exponential Families). Let N € N, k £ [1, \/iV] and w(x ) = ' exp {—k ■ x} i 

a probability density function. For any e > 0 if T ^ Ll(N^Jk/e) then for every i € [3 : N — 3] it 
holds for the function Fi{x ) = w{x ) • B^,i{x) that 


is 


(1 + Vi) 


w(i/N) 


1 


N 


T+ 1 


E* 

3 = 1 


J 


T+ 1 


£ , 1 

< —, w/iere |r/i| < 


Proof. The pth derivative of function w(x) satisfies uJ p \x) = {—k) p ■ w(x). Let I = [0,1] be an 
interval. Straightforward checking shows that 


max = k p ■ max |u>(x)| < 2 ■ k p+1 . 

x&i xei 

By the definition of function &i(x) (c.f. Theorem 9.1) we have 


( 6 ) 


1.2 

b\(x) = — — • x 2 + I 2 k + 


k 2 


x (IT k), 


and we can show that max^ e/ b\(x) ^ 1 + k 2 /8 ^ 1 + N/8. The function 62 (®) satisfies 

62(3;) = 1 + 3(1 — 2 x)k + (1 — 6x + 6 x 2 )k 2 — -x(l — x)(l — 2 x)k 3 + -x 2 (l — x) 2 k 4 , 

6 8 

and we can show that rnax^g/ 62 (%) k 4 / 8 ^ N 2 /8. By Theorem 2.1 it suffices to upper bound 

the following four cases. 

Case 1: By Theorem 9.1 p ^ since max l6 / | 6 i(x)| ^ 1 + N/8 and rnax^g/ | 62 (x)| <C N 2 /8. 


Case 2: We combine Equation 6 and Lemma 9.3 to obtain 

r 1 


f 

Jo 


|w (t) ■ B]sr t i(t)\dt <2 ■ Ar 


\B N ,i(t)\dt = 


'0 


2 -k 3 
N + l' 


Case 3: By combining Equation 6 and Lemma 9.5 it holds 


f \w'(t) ■ B' Ni (t)\dt < 2 ■ k 2 ■ f \B' Ni (t)\dt<4:-k 2 . 

Jo ’ Jo 


Case 4: We use again Equation 6 and Lemma 9.5 to obtain 


|u>(t) • B'/fj(t)\dt <2 ■ k ■ / \B'f[ ^{t)\dt <8 ■ k ■ N. 


Recall that F t {x) = w(x)B]y t i(x). By combining | - \ = w" ■ B^,i + 2 w' ■ B' Ni + w ■ B'/j i and 
Theorem 9.2 we have 


S> 

i =1 




8 T 2 


d 2 F{x) 


dx 2 


H) 


dt < 


k ■ N 

rp2 


Hence, the statement follows. 
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Remark D.4. The hypothesis in Theorem 2.2 holds. Note that w(x) = • exp {—k ■ x} and 

4>w = Thus c f( x ) = exp {-k • x} ; / (2) (x) = k 2 ■ f(x) and k € [1 ,VN]. 

Furthermore, 

a) C = o(N), b) 0 ^ f(x) ^ c) | ^ ^[/(O) + /(!)] < 1 and for d) we have 


\f (2 \x)\dx = k 2 [ \f(x)\d. 
) J o 


'x = yz = (1 — e = o(A r ). 


E Schur Complement 

In this section we prove Lemma 4.1. We use the following result proposed by Peng et al. [5]. 

Lemma E.l. [5, Lemma 4-3] If M is SPSD matrix and (l — e)(D — M) F D — M ■< (1 + e)(D — M) 
then it holds that (1 — e){D + M ) ■< D + M F (1 + s){D + M). 


We extend next two technical results on Schur complement that appeared in [4, 18, 22] 
Claim E.2. Suppose X = 

and M is symmetric matrix. Then v L [P 2 — MP(~ i M]v = min 


P 1 -M 
-M P 2 


where P\ and P 2 are symmetric positive definite matrices 

T 

T[d t\/t rt—l ; 




U 


for every v. 


Proof. Suppose f v {u) = 


Pi -M 
-M P 2 


= u T Piu — 2 u t Mv + v t P 2 v. Notice 


that / is minimized when u = P 1 l Mv, since V/„(u) = 2P\u — 2Mv. Hence, it follows that 
min,j f “ ^ '- T =»—• .dj/o-im, _ ,,Tfn ^n-i: 


= v t P 2 v - v T MPf ] Mv = v T [P 2 - MPf l M]v. 


Lemma E.3. (Schur Complement) Suppose D\,D 2 are positive main diagonal matrices and M,Q 


are symmetric matrices. If Xm — 
D 2 - MDf'M « e D 2 - QDf l Q. 


Di 

—M ' 


' D\ -Q ' 

_ -M 

D 2 


-Q D 2 


= Xq then it holds that 


Proof. Here we show the upper bound, but the lower bound follows by analogy. Let w be a vector 
such that ( ° ) Xq ( ™ ] = min n ( ^ J Xq f ^ J • We apply twice Claim E.2 to obtain the 


v ) ^ \ v 

following chain of inequalities 


v 


X X 

v T [D 2 - M D( l M]v = min f M X M 


w 

V 


IV 


< ( 1+e )^uJ ]=( 1 + £ ) min 

= (1 + e)u T [L> 2 -QL> 1 - 1 Q]u. 


X, 


Q 


We prove Lemma 4.1 by arguing in a similar manner to in [5, Lemma 4.4]. We present the 
proof here for completeness. 
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D -X 
-X D 

the upper bound, but the lower bound follows by analogy. Straightforward checking s’ 


Proof of Lemma f.l. For any symmetric matrix X , we denote by Vx = 


. We prove 
rows that 


V 


= u t Du — v T Mu — u t Mv + v t Dv 


(u + v) L (D — M){u + v) + {u — v ) 1 (D + M){u — v ) 


By Lemma E.l it holds that D + M ~ £ D + M and thus we have 
T 


V 


M 


1 

2 L 


(u + v) t (D — M)(u + v) + (u — v) t (D + M)(u — v) 


^ (1 + e)- [( u + v) t (D - M)(u + v) + (u - v) t (D 2 + M)(u - v)] 


— (1 + e) 


Vm 


Hence, it follows that V 
holds that D — MD~ 1 M ft 



D 

-M 


D 

-M ' 


-M 

D 


-M 

D 

D 

— MD~ 

_1 M. 





Vm- Now, by Lemma E.3 it 
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